Data Centric AI

Charles F Vardeman II

2023-09-05

Trusted AI Frameworks for Knowledge Engineering

Foundational Components for Trusted AI

  • Continuous Integration and Deployment (CI/CD): Automate the integration and deployment of code, ensuring quality and operational efficiency.
  • Standardized Development Environments: Establish consistent, easily replicable environments to accelerate development and experimentation.
  • Data & Experiment Versioning: Implement robust systems to track changes in data and experiments, allowing for traceability and repeatability.
  • Model Lifecycle Management: Streamline the training, deployment, monitoring, and updating of machine learning models.
  • Flexibility Across Layers: Design the architecture to allow for different levels of customization, from high-level APIs to low-level controls, facilitating adaptability.

Data Centric AI

Aside: Lessons from the Semantic Web?

DVC and Data Centric AI?

Data-centric AI is an emerging approach that emphasizes data quality and data engineering in building AI systems. It aims to improve the performance and robustness of AI models by systematically characterizing, evaluating, and monitoring the data used to train and evaluate them⁴. It also uses data-driven methods and tools to inform decisions at each stage of the ML pipeline⁴.
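As a rough illustration of what "characterizing" a dataset can mean in practice, the sketch below computes a few basic quality signals (missing values, exact duplicates, label balance) for a list of records. The function name `profile_dataset`, the record layout, and the choice of signals are assumptions made for this example, not part of any particular data-centric AI toolkit.

```python
from collections import Counter


def profile_dataset(rows, label_key="label"):
    """Summarize basic quality signals for a list of dict records.

    Hypothetical helper for illustration only.
    """
    # Count empty or missing values per field.
    missing = Counter()
    for row in rows:
        for key, value in row.items():
            if value is None or value == "":
                missing[key] += 1

    # Count exact duplicates: records whose full contents match an earlier one.
    seen = set()
    duplicates = 0
    for row in rows:
        fingerprint = tuple(sorted((k, repr(v)) for k, v in row.items()))
        if fingerprint in seen:
            duplicates += 1
        else:
            seen.add(fingerprint)

    # Label distribution, including unlabeled records (None).
    label_counts = Counter(row.get(label_key) for row in rows)

    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicates": duplicates,
        "label_counts": dict(label_counts),
    }


# Toy records, invented for this example.
rows = [
    {"text": "good movie", "label": "pos"},
    {"text": "bad movie", "label": "neg"},
    {"text": "good movie", "label": "pos"},
    {"text": "", "label": "pos"},
    {"text": "fine", "label": None},
]
report = profile_dataset(rows)
```

Running a report like this on each new version of a dataset is one simple way to monitor the data over time rather than only the model.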

One tool that supports data-centric AI is Data Version Control (DVC), a system for versioning machine learning models, data sets, and intermediate files. DVC connects these artifacts with code and supports a range of storage options for the file contents³. It lets users track and reproduce experiments, share data and models, and collaborate on AI projects³.
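The general idea behind this style of data versioning can be sketched in a few lines: store file contents in a cache keyed by a content hash, and keep only a small metadata record alongside the code. The example below is a simplified illustration of that idea under stated assumptions, not DVC's actual implementation or file format; the function names `track` and `restore`, the metadata fields, and the use of SHA-256 are all hypothetical choices for this sketch.

```python
import hashlib
import tempfile
from pathlib import Path


def track(data_path: Path, cache_dir: Path) -> dict:
    """Copy a file into a content-addressed cache and return its metadata.

    The returned dict is small enough to commit next to the code,
    while the (possibly large) contents live in the cache.
    """
    content = data_path.read_bytes()
    digest = hashlib.sha256(content).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / digest).write_bytes(content)
    return {"path": data_path.name, "sha256": digest}


def restore(meta: dict, cache_dir: Path) -> bytes:
    """Fetch the exact contents that a metadata record refers to."""
    return (cache_dir / meta["sha256"]).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data = root / "train.csv"
    cache = root / "cache"

    # Version 1 of the dataset.
    data.write_bytes(b"x,y\n1,2\n")
    meta_v1 = track(data, cache)

    # Version 2: the same file, with one more row.
    data.write_bytes(b"x,y\n1,2\n3,4\n")
    meta_v2 = track(data, cache)

    # Either version can be recovered from its metadata record.
    restored_v1 = restore(meta_v1, cache)
    restored_v2 = restore(meta_v2, cache)
```

Because each metadata record pins one exact version of the data, checking out an older commit of the code also identifies the data that went with it, which is what makes experiments repeatable.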

For further reading, a survey paper discusses the necessity, goals, methods, challenges, and benchmarks of data-centric AI¹. Andrew Ng, who popularized the term, covers it in a video lecture⁶, and a companion website provides a checklist and resources for applying data-centric AI in practice⁴.

Data Version Control (DVC)

Hugging Face

AI Testimony before US Senate

Clément Delangue Senate Statement

DVC and Hugging Face Integration (Team Frameworks – Peter)

JSON-LD Model and “AI Based Microservices”

Motivation…

How do we develop a curriculum for training large language models?

The “Pile”

LLaMA: Open and Efficient Foundation Language Models

(GPT-4) “Sparks of AGI”?

Textbooks Are All You Need!

Textbooks are all you need II: phi-1.5

Y. Li, S. Bubeck, R. Eldan, A. Del Giorno, S. Gunasekar, and Y. T. Lee, “Textbooks Are All You Need II: phi-1.5 technical report.” arXiv, Sep. 11, 2023. Accessed: Sep. 12, 2023. [Online]. Available: http://arxiv.org/abs/2309.05463

Microsoft/phi-1_5

phi-1.5 Doesn’t want to kill us all…

Sebastien Bubeck on X